Stochastic Query Covering for Fast Approximate Document Retrieval
We design algorithms that, given a collection of documents and a distribution over user queries, return a
small subset of the document collection in such a way that we can efficiently provide high-quality answers
to user queries using only the selected subset. This approach has applications when space is a constraint
or when the query-processing time increases significantly with the size of the collection. We study our
algorithms through the lens of stochastic analysis and prove that even though they use only a small fraction
of the entire collection, they can provide answers to most user queries, achieving a performance close to the
optimal. To complement our theoretical findings, we experimentally show the versatility of our approach
by considering two important cases in the context of Web search. In the first case, we favor the retrieval of
documents that are relevant to the query, whereas in the second case we aim for document diversification.
Both the theoretical and the experimental analysis provide strong evidence of the potential value of query
covering in diverse application scenarios.
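The abstract does not spell out the selection algorithms, so the following is only a minimal sketch of the general idea, assuming a standard greedy weighted-coverage heuristic rather than the authors' method. The names `greedy_query_cover`, `covered_mass`, `query_prob`, and `doc_answers` are hypothetical, and a document is assumed to "cover" a query when it is a good enough answer to it.

```python
from typing import Dict, Iterable, List, Set


def greedy_query_cover(
    query_prob: Dict[str, float],
    doc_answers: Dict[str, Set[str]],
    budget: int,
) -> List[str]:
    """Greedily pick up to `budget` documents (hypothetical helper).

    At each step, take the document that covers the largest amount of
    still-uncovered query probability mass.
    """
    uncovered = set(query_prob)
    chosen: List[str] = []
    for _ in range(budget):
        best_doc, best_gain = None, 0.0
        # Sorted iteration keeps tie-breaking deterministic.
        for doc in sorted(doc_answers):
            if doc in chosen:
                continue
            gain = sum(query_prob.get(q, 0.0) for q in doc_answers[doc] & uncovered)
            if gain > best_gain:
                best_doc, best_gain = doc, gain
        if best_doc is None:  # nothing left adds coverage
            break
        chosen.append(best_doc)
        uncovered -= doc_answers[best_doc]
    return chosen


def covered_mass(
    query_prob: Dict[str, float],
    doc_answers: Dict[str, Set[str]],
    chosen: Iterable[str],
) -> float:
    """Probability that a query drawn from `query_prob` is covered by `chosen`."""
    covered: Set[str] = set()
    for doc in chosen:
        covered |= doc_answers[doc]
    return sum(p for q, p in query_prob.items() if q in covered)


# Toy data: three queries with their probabilities, three documents.
query_prob = {"q1": 0.5, "q2": 0.3, "q3": 0.2}
doc_answers = {"d1": {"q1"}, "d2": {"q2", "q3"}, "d3": {"q1", "q2"}}
subset = greedy_query_cover(query_prob, doc_answers, budget=2)
```

On this toy input, two of the three documents are enough to cover every query, which illustrates how a small subset can still answer most of the query distribution.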
Blake, Charles
Caching search results is employed in information retrieval systems to
expedite query processing and reduce back-end server workload. Motivated by the
observation that queries belonging to different topics have different
temporal-locality patterns, we investigate a novel caching model called STD
(Static-Topic-Dynamic cache). It improves traditional SDC (Static-Dynamic
Cache) that stores in a static cache the results of popular queries and manages
the dynamic cache with a replacement policy for intercepting the temporal
variations in the query stream. Our proposed caching scheme includes another
layer for topic-based caching, where the entries are allocated to different
topics (e.g., weather, education). The results of queries characterized by a
topic are kept in the fraction of the cache dedicated to it. This makes it possible to
adapt the cache-space utilization to the temporal locality of the various
topics and reduces cache misses due to those queries that are neither
sufficiently popular to be in the static portion nor requested within
short-time intervals to be in the dynamic portion. We simulate different
configurations for STD using two real-world query streams. Experiments
demonstrate that our approach outperforms SDC, increasing hit rates by up to
3% and reducing the gap between SDC and the theoretically optimal caching
algorithm by up to 36%.
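The three-layer layout described above can be sketched as follows. This is an illustrative sketch, not the authors' implementation: the class names `LRUCache` and `STDCache` are hypothetical, LRU is assumed as the replacement policy for both the topic and dynamic layers, and the topic of each query is assumed to be supplied by some external classifier.

```python
from collections import OrderedDict
from typing import Dict, Optional


class LRUCache:
    """Small least-recently-used cache (assumed replacement policy)."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.entries: "OrderedDict[str, str]" = OrderedDict()

    def get(self, key: str) -> Optional[str]:
        if key not in self.entries:
            return None
        self.entries.move_to_end(key)  # mark as most recently used
        return self.entries[key]

    def put(self, key: str, value: str) -> None:
        if self.capacity <= 0:
            return
        self.entries[key] = value
        self.entries.move_to_end(key)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict least recently used


class STDCache:
    """Static layer, per-topic layers, and a dynamic layer for everything else."""

    def __init__(
        self,
        static_results: Dict[str, str],
        topic_capacity: Dict[str, int],
        dynamic_capacity: int,
    ) -> None:
        self.static = dict(static_results)  # popular queries, never evicted
        self.topics = {t: LRUCache(c) for t, c in topic_capacity.items()}
        self.dynamic = LRUCache(dynamic_capacity)

    def _layer(self, topic: Optional[str]) -> LRUCache:
        # Queries without a known topic fall back to the dynamic layer.
        return self.topics.get(topic, self.dynamic)

    def lookup(self, query: str, topic: Optional[str] = None) -> Optional[str]:
        if query in self.static:
            return self.static[query]
        return self._layer(topic).get(query)

    def store(self, query: str, result: str, topic: Optional[str] = None) -> None:
        if query not in self.static:
            self._layer(topic).put(query, result)


cache = STDCache({"facebook": "r0"}, {"weather": 1}, dynamic_capacity=1)
cache.store("rain rome", "r1", topic="weather")
cache.store("python docs", "r2")
cache.store("java docs", "r3")  # evicts "python docs" from the dynamic layer
```

Because the weather entry lives in its own partition, traffic on untopical queries cannot evict it, which is the effect the abstract attributes to the topic layer.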
Caching Historical Embeddings in Conversational Search
Rapid response, namely low latency, is fundamental in search applications; it
is particularly so in interactive search sessions, such as those encountered in
conversational settings. An observation with a potential to reduce latency
asserts that conversational queries exhibit a temporal locality in the lists of
documents retrieved. Motivated by this observation, we propose and evaluate a
client-side document embedding cache, improving the responsiveness of
conversational search systems. By leveraging state-of-the-art dense retrieval
models to abstract document and query semantics, we cache the embeddings of
documents retrieved for a topic introduced in the conversation, as they are
likely relevant to successive queries. Our document embedding cache implements
an efficient metric index, answering nearest-neighbor similarity queries by
estimating the approximate result sets returned. We demonstrate the efficiency
achieved using our cache via reproducible experiments based on TREC CAsT
datasets, achieving a hit rate of up to 75% without degrading answer quality.
The high cache hit rates we achieve significantly improve the responsiveness of
conversational systems while also reducing the number of queries managed on
the search back-end.
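As a rough illustration of a client-side embedding cache, the sketch below stores document embeddings returned by the back-end and answers nearest-neighbor queries locally by brute force. It is an assumption-laden toy, not the metric index from the paper: the class `EmbeddingCache`, the `hit_radius` parameter, and the rule "a query is a hit when it lies close to a previously seen query" are all hypothetical.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def distance(a: Vector, b: Vector) -> float:
    """Euclidean distance between two embeddings of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class EmbeddingCache:
    """Toy client-side cache of document embeddings (hypothetical design)."""

    def __init__(self, hit_radius: float) -> None:
        self.hit_radius = hit_radius  # assumed hit threshold
        self.docs: Dict[str, Vector] = {}
        self.seen_queries: List[Vector] = []

    def add_backend_results(self, query: Vector, results: Dict[str, Vector]) -> None:
        """Remember a query sent to the back-end and the embeddings it returned."""
        self.seen_queries.append(query)
        self.docs.update(results)

    def is_hit(self, query: Vector) -> bool:
        """Assume the cache can answer if the query is near an earlier one."""
        return any(distance(query, q) <= self.hit_radius for q in self.seen_queries)

    def top_k(self, query: Vector, k: int) -> List[str]:
        """Brute-force nearest neighbors over the cached documents."""
        ranked = sorted(self.docs, key=lambda d: distance(query, self.docs[d]))
        return ranked[:k]


emb_cache = EmbeddingCache(hit_radius=0.5)
emb_cache.add_backend_results((0.0, 0.0), {"a": (0.1, 0.0), "b": (1.0, 1.0)})
follow_up = (0.2, 0.0)  # a follow-up utterance on the same topic
answer = emb_cache.top_k(follow_up, k=1) if emb_cache.is_hit(follow_up) else None
```

A miss would be forwarded to the back-end and its results added with `add_backend_results`, so later utterances on the same topic can be served locally.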
How future surgery will benefit from SARS-COV-2-related measures: a SPIGC survey conveying the perspective of Italian surgeons
COVID-19 negatively affected surgical activity, but the potential benefits resulting from adopted measures remain unclear. The aim of this study was to evaluate the change in surgical activity and the potential benefits of COVID-19 measures from the perspective of Italian surgeons, on behalf of SPIGC. A nationwide online survey on surgical practice before, during, and after the COVID-19 pandemic was conducted in March-April 2022 (NCT:05323851). The effects of COVID-19 hospital-related measures on surgical patients' management and personal professional development across surgical specialties were explored. Data on demographics, pre-operative/peri-operative/post-operative management, and professional development were collected. Outcomes were matched with the corresponding volume. Four hundred and seventy-three respondents across 14 surgical specialties were included in the final analysis. Since the SARS-CoV-2 pandemic, the use of telematic consultations (4.1% vs. 21.6%; p < 0.0001) and diagnostic evaluations (16.4% vs. 42.2%; p < 0.0001) has increased. Elective surgical activity was significantly reduced, and surgeons opted more frequently for conservative management in cases with a possible indication for elective (26.3% vs. 35.7%; p < 0.0001) or urgent (20.4% vs. 38.5%; p < 0.0001) surgery. All new COVID-related measures are expected to be maintained in the future. Surgeons' online personal education increased from 12.6% (pre-COVID) to 86.6% (post-COVID; p < 0.0001). Online educational activities are considered a beneficial effect of the COVID pandemic (56.4%). COVID-19 had a great impact on surgical specialties, with a significant reduction in operation volume. However, some forced changes turned out to be benefits. Isolation measures pushed the use of telemedicine and telemetric devices for outpatient practice and favored communication for educational purposes and surgeon-patient/family communication.
From the Italian surgeons' perspective, COVID-related measures will continue to influence future surgical clinical practice.
USI Participation at SMERP 2017 Text Summarization Task
Abstract. This short report describes the participation of the Università della Svizzera italiana (USI) in the SMERP Workshop Data Challenge Track for the Level 1 text summarization task. Our participation is based on a linear interpolation that combines the relevance and novelty scores of the retrieved tweets. Our method is fully automatic. For the relevance score we used the results from our runs at the text retrieval task, whereas for the novelty score we used a method based on Word2Vec. In total, we submitted four different runs and used two different weight parameters. The results showed that when relevance and novelty contribute equally to selecting the tweets for the summary, performance is better than when only novelty is favored. Additionally, information from POS tags improves the performance of the summarization task.
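The linear interpolation of relevance and novelty can be illustrated with a short sketch. It is not the submitted system: the function names, the greedy selection loop, and the definition of novelty as one minus the highest cosine similarity to the tweets already selected are assumptions, and the toy vectors stand in for Word2Vec-based tweet representations.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def novelty(tweet: str, selected: List[str], vectors: Dict[str, Vector]) -> float:
    """Assumed novelty: 1 minus the highest similarity to any selected tweet."""
    closest = max((cosine(vectors[tweet], vectors[s]) for s in selected), default=0.0)
    return 1.0 - closest


def select_tweets(
    relevance: Dict[str, float],
    vectors: Dict[str, Vector],
    weight: float,
    k: int,
) -> List[str]:
    """Greedily pick k tweets by weight * relevance + (1 - weight) * novelty."""
    selected: List[str] = []
    candidates = sorted(relevance)
    while candidates and len(selected) < k:
        best = max(
            candidates,
            key=lambda t: weight * relevance[t]
            + (1.0 - weight) * novelty(t, selected, vectors),
        )
        selected.append(best)
        candidates.remove(best)
    return selected


# Toy data: t2 is a near-duplicate of t1, t3 is different but less relevant.
relevance = {"t1": 0.9, "t2": 0.85, "t3": 0.4}
vectors = {"t1": (1.0, 0.0), "t2": (1.0, 0.0), "t3": (0.0, 1.0)}
balanced = select_tweets(relevance, vectors, weight=0.5, k=2)
relevance_only = select_tweets(relevance, vectors, weight=1.0, k=2)
```

With equal weights the near-duplicate is skipped in favor of the novel tweet, while a weight of 1.0 reduces the selection to a plain relevance ranking.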
Cache Optimization Via Topics in Web Search Engines
Embodiments may provide a cache for query results that can adapt the cache-space utilization to the popularity of the various topics represented in the query stream. For example, a method for query processing may include receiving a plurality of queries for data and requesting data responsive to at least one query from a data cache comprising a temporal cache, wherein the temporal cache is configured to store data based on a topic associated with the data and is configured to retrieve data based on a topic, and wherein the data cache is configured to retrieve data responsive to at least one query from the computer system.
Emotional Influence Prediction of News Posts
Nowadays, on-line news agents post news articles on social media platforms with the aim of spreading information as well as attracting more users and understanding their reactions and opinions. Predicting the emotional influence of news on users is very important not only for news agents but also for users, who can filter out news articles based on the reactions they trigger. In this paper, we focus on the problem of predicting the emotional influence of a news post on users before publication. For the prediction, we explore a range of textual and semantic features derived from the content of the posts. Our results show that terms are the most important feature and that features extracted from the content of news posts make it possible to effectively predict the amount of emotional reactions triggered by a news post.